
[gpt-oss] use vLLM instead of openai types for streaming#25186

Merged
hmellor merged 6 commits into vllm-project:main from qandrew:gpt-oss-streaming-3-wrap-types
Sep 30, 2025

Conversation

Contributor

@qandrew qandrew commented Sep 18, 2025

Purpose

Test Plan

Test Result

client

curl http://localhost:20001/v1/responses   -H "Content-Type: application/json"   -N   -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "user",
            "content": "Hello."
        }
    ],
    "temperature": 0.7,
    "max_output_tokens": 256,
    "stream": true, 
    "enable_response_messages": true
}'  

ResponseCompletedEvent output (see that the input/output messages are there)

{
    "response": {
        "id": "resp_aed05365be36434fb820a6eab52bfc39",
        "created_at": 1758328028,
        "incomplete_details": null,
        "instructions": null,
        "metadata": null,
        "model": "/data/users/axia/checkpoints/gpt-oss-120b",
        "object": "response",
        "output": [
            {
                "id": "rs_2ebdf86c76a54ddb85c5a867c31f9b77",
                "summary": [],
                "type": "reasoning",
                "content": [
                    {
                        "text": "The user says \"Hello.\" Probably a greeting. Should respond politely, maybe ask how can help. Keep brief friendly.",
                        "type": "reasoning_text"
                    }
                ],
                "encrypted_content": null,
                "status": null
            },
            {
                "id": "msg_30edf4e235b546078b529ebfaa6d7bee",
                "content": [
                    {
                        "annotations": [],
                        "text": "Hello! How can I assist you today?",
                        "type": "output_text",
                        "logprobs": null
                    }
                ],
                "role": "assistant",
                "status": "completed",
                "type": "message"
            }
        ],
        "input_messages": [
            {
                "author": {
                    "role": "system",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": null,
                "recipient": null,
                "content_type": null
            },
            {
                "author": {
                    "role": "user",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": null,
                "recipient": null,
                "content_type": null
            }
        ],
        "output_messages": [
            {
                "author": {
                    "role": "assistant",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": "analysis",
                "recipient": null,
                "content_type": null
            },
            {
                "author": {
                    "role": "assistant",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": "final",
                "recipient": null,
                "content_type": null
            }
        ],
        "parallel_tool_calls": true,
        "temperature": 0.7,
        "tool_choice": "auto",
        "tools": [],
        "top_p": 1.0,
        "background": false,
        "max_output_tokens": 256,
        "max_tool_calls": null,
        "previous_response_id": null,
        "prompt": null,
        "reasoning": null,
        "service_tier": "auto",
        "status": "completed",
        "text": null,
        "top_logprobs": null,
        "truncation": "disabled",
        "usage": {
            "input_tokens": 67,
            "input_tokens_details": {
                "cached_tokens": 64
            },
            "output_tokens": 43,
            "output_tokens_details": {
                "reasoning_tokens": 25,
                "tool_output_tokens": 0
            },
            "total_tokens": 110
        },
        "user": null
    },
    "sequence_number": 45,
    "type": "response.completed"
}
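With `enable_response_messages` set, a client can read the full `input_messages` / `output_messages` off the terminal `response.completed` event. A minimal sketch using plain `json` parsing; the payload below is trimmed down from the example output above:

```python
import json

# Trimmed response.completed payload, adapted from the example output above.
event = json.loads("""
{
    "type": "response.completed",
    "sequence_number": 45,
    "response": {
        "output_messages": [
            {"author": {"role": "assistant"}, "channel": "analysis"},
            {"author": {"role": "assistant"}, "channel": "final"}
        ]
    }
}
""")

if event["type"] == "response.completed":
    # Each output message carries its harmony channel: "analysis" for
    # reasoning, "final" for the user-visible answer.
    channels = [m["channel"] for m in event["response"]["output_messages"]]
    print(channels)  # → ['analysis', 'final']
```

This only sketches field access on the streamed JSON; the server-side types are the vLLM ones this PR introduces.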


@qandrew qandrew changed the title [gptoss] use vLLM streaming types [gpt-oss] use vLLM streaming types Sep 18, 2025
@mergify mergify Bot added frontend gpt-oss Related to GPT-OSS models labels Sep 18, 2025
Contributor

mergify Bot commented Sep 19, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @qandrew.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Sep 19, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew force-pushed the gpt-oss-streaming-3-wrap-types branch from 0c32be0 to 7b82dbf Compare September 20, 2025 00:18
@mergify mergify Bot removed the needs-rebase label Sep 20, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew changed the title [gpt-oss] use vLLM streaming types [gpt-oss] use vLLM instead of openai types for streaming Sep 20, 2025
@qandrew qandrew marked this pull request as ready for review September 21, 2025 04:50
Contributor Author

qandrew commented Sep 21, 2025

cc @alecsolder, @yeqcharlotte, @chaunceyjiang, ready for review. I think the readthedocs CI failure is a flake.

@lacora
Contributor

lacora commented Sep 22, 2025

Maybe we need to think about whether to allow input_messages / output_messages for sync responses only, or for streaming as well; appending long messages kills the efficiency that streaming provides. cc @yeqcharlotte

Contributor Author

qandrew commented Sep 22, 2025

Maybe we need to think about whether to allow input_messages / output_messages for sync responses only, or for streaming as well; appending long messages kills the efficiency that streaming provides. cc @yeqcharlotte

That's a good point! This PR only modifies events in streaming right now (ResponseCreatedEvent, ResponseInProgressEvent, ResponseCompletedEvent), so none of the deltas / intermediate events will get additional payload. Or I could remove it from ResponseCreatedEvent and ResponseInProgressEvent, but maybe it's fine to keep it for now for consistency?
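To illustrate why keeping the extra fields off the deltas matters: only the full-response events carry the response object, so a consumer dispatches on `type` and the per-token deltas stay small. A hedged sketch over synthetic event dicts (the `response.output_text.delta` / `response.completed` names follow the Responses API event types; `collect_text` is a hypothetical helper, not part of this PR):

```python
def collect_text(events):
    """Accumulate output text from delta events; read the full messages
    only from response.completed, which carries the whole response."""
    text_parts = []
    output_messages = None
    for event in events:
        if event["type"] == "response.output_text.delta":
            # Deltas carry only the incremental text, no message payload.
            text_parts.append(event["delta"])
        elif event["type"] == "response.completed":
            # The terminal event carries the full response object,
            # including output_messages when enabled.
            output_messages = event["response"].get("output_messages")
    return "".join(text_parts), output_messages

# Synthetic stream: two deltas, then the completed event.
stream = [
    {"type": "response.output_text.delta", "delta": "Hello! "},
    {"type": "response.output_text.delta", "delta": "How can I assist you today?"},
    {"type": "response.completed",
     "response": {"output_messages": [{"channel": "final"}]}},
]
text, msgs = collect_text(stream)
```

Under this shape, per-token events stay lightweight regardless of how large the accumulated messages grow.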

Collaborator

@yeqcharlotte yeqcharlotte left a comment


LGTM. Could you add basic skeleton tests that load the type? cc:
@chaunceyjiang @aarnphm PTAL too, thanks!

ResponseWebSearchCallCompletedEvent,
ResponseWebSearchCallInProgressEvent,
ResponseWebSearchCallSearchingEvent)
# yapf: enable
Collaborator


Is the format change intentional?

Contributor Author


Yep, yapf/isort had a conflict; it seems like that happens pretty often...

Collaborator


can we keep the original format?

Contributor Author


@yeqcharlotte we can't. The import below introduces the conflict; I tried to keep the original format, but the conflicts come back. You can see in serving_responses.py that "# yapf conflicts with isort for this block" happens often:

from openai.types.responses import (
    ResponseInProgressEvent as OpenAIResponseInProgressEvent)

Collaborator

@yeqcharlotte yeqcharlotte left a comment


LGTM! thanks for the change!

@github-project-automation github-project-automation Bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 27, 2025
@yeqcharlotte yeqcharlotte enabled auto-merge (squash) September 27, 2025 00:10
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 27, 2025
Signed-off-by: Andrew Xia <axia@fb.com>
auto-merge was automatically disabled September 30, 2025 20:51

Head branch was pushed to by a user without write access

@hmellor hmellor enabled auto-merge (squash) September 30, 2025 22:09
@hmellor hmellor merged commit 5db1870 into vllm-project:main Sep 30, 2025
45 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>

Labels

frontend gpt-oss Related to GPT-OSS models ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

Development


4 participants